ABSTRACT 

Students' evaluations of teacher effectiveness are 
explored as a multidimensional construct, emphasizing the Students' 
Evaluations of Educational Quality (SEEQ) instrument developed by H. 
W. Marsh (1987). An overview is presented of studies in which Marsh 
evaluated longitudinal data from an archive of responses to nearly 1 
million SEEQ instruments representing 50,000 courses collected over 
13 years. The focus is on a cohort of nearly 200 teachers evaluated 
over that period. The studies reviewed consider: (1) the 
generalizability of the SEEQ factor structure; (2) higher-order 
factor structure; (3) the generalizability of ratings over time; (4) 
models of covariance stability; and (5) analyses of teacher profiles. 
Analyses provided clear support for the generalizability of the SEEQ 
factor structure over time, over courses, and over teaching at 
different levels. Higher-order factor analyses suggest that the higher-order 
factors that might underlie SEEQ factors are not particularly 
useful in providing a smaller number of scores with which to 
summarize SEEQ responses. Each instructor was found to have a 
reasonably distinct profile. Mean ratings of the same teachers, 
evaluated consistently over the 13 years, showed no systematic increases or decreases. 
An appendix contains the SEEQ instrument. Four tables and three 
figures present study data. (SLD) 
INTRODUCTION 

This paper examines students' evaluations of teaching effectiveness (SETs) as a 
multidimensional construct and emphasizes the Students' Evaluations of Educational Quality 
(SEEQ; Marsh, 1987) instrument. Previous research demonstrates that class-average SETs are: 
1) multidimensional; 2) reliable and stable; 3) primarily a function of the instructor who teaches 
a course rather than the course that is taught; 4) relatively valid against a variety of indicators of 
effective teaching; 5) relatively unaffected by a variety of variables hypothesized as potential 
biases; and 6) seen to be useful by faculty as feedback about their teaching, by students for use in 
course selection, and by administrators for use in personnel decisions. 

Despite the huge volume of SET research, most studies have considered ratings collected 
in one specific course on a single occasion, and there is surprisingly little longitudinal research 
that considers the ratings of the same teachers over an extended period of time. For example, 
inferences about how SETs are related to teacher age are typically based on cross-sectional 
studies. Cross-sectional studies, however, provide a poor basis for inferring what ratings 
younger teachers will receive later in their careers or what ratings older teachers would have 
received if evaluated earlier in their careers. Clearly, there are important limitations in the use of 
cross-sectional data for evaluating how ratings of the same instructor vary over time. 

In this paper I present an overview of studies in which I evaluated 
longitudinal data derived from an archive of responses to nearly 1 million SEEQ instruments 
representing 50,000 courses that have been collected over a 13-year period of time. I begin by 
briefly reviewing support for the generalizability of the SEEQ factor structure. For purposes of 
the longitudinal analyses I focus on a cohort of nearly 200 teachers who were evaluated 
continuously over the 13-year period. In different studies I examine three perspectives on the 
question of how ratings of the same instructors vary over time. 

1. How well does the SEEQ factor structure generalize across teaching at different levels 
and in different disciplines (Marsh & Hocevar, 1991a; also see Marsh & Roche, in press)? This 
question is addressed by the comparison of 22 factor analyses of ratings of unique groups of 
teachers that vary in terms of academic discipline (e.g., psychology, Spanish, engineering) and 
level (e.g., undergraduate and graduate level courses). In related research I have also examined 
second-order factors (Marsh, 1991b, 1991c). 

2. How do the mean ratings of the same set of teachers vary over time (Marsh & Hocevar, 
1991b)? The question here is whether the mean ratings for the longitudinal cohort of teachers 
systematically increase or decrease over time. Contrary to the results from reviews of cross-sectional 
studies, this analysis shows that the mean ratings of these teachers are stable over time. 

3. How highly related are ratings from different occasions over the 13-year period 
(Marsh, 1991)? Here I looked at the test-retest correlations from one year to the next and for 
longer periods of time. The results showed that test-retest correlations were high for short 
periods of time and were nearly as high for much longer periods of time. Different theoretical 
models positing a "simplex" pattern of growth and a single latent construct with no systematic 
change were evaluated. 

4. How stable is the profile of SEEQ factors for the same teacher over extended periods 
of time (Marsh & Bailey, 1991)? For any particular set of ratings, ratings for one scale (e.g., 
Enthusiasm) will be higher or lower than ratings for another scale (e.g., Organization). Here I consider 
whether the profile of SEEQ scores for the same teacher generalizes over courses, over course 
levels (graduate and undergraduate classes), and over the 13-year period during which the 
ratings were collected. These results show that the profile of SEEQ scores is also very stable 
over time, apparently even more stable than the overall ratings. 




The selection and revision of the SEEQ items were based on literature reviews, student 
and teacher responses about the importance of items, teacher responses about the usefulness of 
items, examination of open-ended comments by students, and psychometric properties of the 
responses, thus supporting the content validity of SEEQ responses. Factor analytic support for 
the SEEQ scales is particularly strong. To date, more than 30 published factor analyses of SEEQ 
responses have identified the 9 factors that SEEQ is designed to measure. 

Insert Tables 1 and 2 About Here 

Marsh and Hocevar (1991a) described the archive of SEEQ responses that contains 
ratings of 50,000 classes (representing responses to nearly 1 million SEEQ surveys). From this 
archive, 24,158 courses were selected and classified into one of 21 different subgroups (see 
Table 1) varying in terms of teacher rank (teaching assistant or regular staff), level of instruction 
(undergraduate or graduate), and academic discipline. Twenty-two separate factor analyses of the 
total sample (see Table 2) and each of the subsamples all identified the 9 factors that SEEQ is 
designed to measure, providing very strong support for the generality of the factor structure 
underlying SETs. 

For each course, two sets of factor scores were derived: one based on the factor analysis 
of the total sample of 24,158 courses and one based on the specific subsample (of the 21 
subsamples) to which the course was classified. These two sets of factor scores were correlated 
in each of the 21 different subgroups. High correlations among factor scores representing the 
same factor provide support for the comparability of the different factor structures. Nearly all of 
the 189 correlations (9 SEEQ factors x 21 subsamples) were greater than .95 and the majority 
were larger than .99. 
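The comparability check described above amounts to correlating, within each subgroup, each course's factor score from the total-sample solution with its factor score from the subgroup solution, one correlation per SEEQ factor. The following is a minimal sketch in Python, under the assumption that both sets of factor scores are already available as equally long lists keyed by factor name. The function names, the data layout, and the `cutoff` parameter are illustrative and do not come from the original analyses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def factor_score_agreement(total_scores, subgroup_scores, cutoff=0.95):
    """Correlate total-sample and subgroup factor scores, factor by factor.

    Both arguments map a SEEQ factor name to the factor scores of the same
    courses, in the same order (a hypothetical data layout). Returns the
    per-factor correlations and the number of them above `cutoff`.
    """
    rs = {f: pearson(total_scores[f], subgroup_scores[f]) for f in total_scores}
    n_above = sum(r > cutoff for r in rs.values())
    return rs, n_above


# Hypothetical scores for four courses in one subgroup; two factors shown.
total = {"Enthusiasm": [0.2, -1.1, 0.9, 0.4], "Organization": [1.0, -0.3, -0.8, 0.1]}
sub = {"Enthusiasm": [0.25, -1.0, 0.85, 0.42], "Organization": [0.9, -0.35, -0.7, 0.05]}
print(factor_score_agreement(total, sub))
```

Running this once per subgroup with all 9 factors would produce the 9 x 21 = 189 correlations summarized in the text.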

Because of the psychometric properties of the SEEQ instrument and because of the size 
and diversity of the database considered here, the results provide stronger support for the 
generality of the factor structure underlying SETs than does any previous research. 


HIGHER-ORDER FACTOR STRUCTURES 



SEEQ clearly measures distinct dimensions of teaching effectiveness, but some 
researchers argue that SETs can be explained by one or a relatively small number of higher-order 
factors that incorporate distinct first-order factors. I (Marsh, 1991a) examined this possibility in 
a higher-order factor analysis based on hierarchical confirmatory factor analysis (HCFA). 
Based on previous research, different models were posited that had 9 first-order factors and either 
1, 2, 3, or 4 higher-order factors. The model with 9 first-order factors and 4 higher-order factors 
fit the data best. Even this model was not entirely satisfactory, in that much of the true score 
variance in the first-order factors could not be explained in terms of the higher-order factors. 
The study demonstrates that SEEQ responses cannot be explained adequately by one or even 
a few summary scores and illustrates the application of hierarchical confirmatory factor analysis. 
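The phrase "true score variance not explained by the higher-order factors" can be made concrete as follows. In a hierarchical CFA, the variance of each first-order factor splits into two parts: the part reproduced by the higher-order factors (a diagonal element of Gamma x Phi2 x Gamma', where Gamma holds the higher-order loadings and Phi2 the higher-order factor correlations) and a residual disturbance psi. The plain-Python sketch below computes that share. All loadings are hypothetical and are not the estimates reported by Marsh (1991a).

```python
def explained_by_higher_order(gamma, phi2, psi):
    """Share of each first-order factor's variance reproduced by higher-order factors.

    gamma: one row per first-order factor, holding its loadings on the
           higher-order factors.
    phi2:  correlation matrix of the higher-order factors.
    psi:   disturbance (residual) variance of each first-order factor.
    """
    shares = []
    for g, resid in zip(gamma, psi):
        k = len(g)
        # Diagonal element of Gamma * Phi2 * Gamma' for this first-order factor.
        common = sum(g[a] * phi2[a][b] * g[b] for a in range(k) for b in range(k))
        shares.append(common / (common + resid))
    return shares


# Hypothetical loadings: three first-order factors, two correlated higher-order factors.
gamma = [[0.8, 0.0], [0.6, 0.0], [0.0, 0.7]]
phi2 = [[1.0, 0.5], [0.5, 1.0]]
psi = [0.36, 0.64, 0.51]
print([round(s, 2) for s in explained_by_higher_order(gamma, phi2, psi)])  # prints [0.64, 0.36, 0.49]
```

A share well below 1 for many first-order factors is what "much of the true score variance could not be explained" looks like numerically.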

Insert Figure 1 About Here 



GENERALIZABILITY OF RATINGS OVER TIME 



The two most common approaches to the study of stability and change refer to the 
stability of means over time (mean stability) and to the stability of individual differences over 
time (covariance stability or test-retest correlations). In this section I focus primarily on mean 
stability. 
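Both kinds of stability can be read from any teachers-by-occasions matrix. Mean stability asks whether the occasion means drift. Covariance stability asks whether teachers keep their relative standing, which is measured by the test-retest correlations between occasions. The stdlib-only sketch below computes both. The data layout and the function names are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def stability_summary(ratings):
    """Summarize mean and covariance stability of a teachers-by-occasions matrix.

    ratings[i][t] is teacher i's rating on occasion t (hypothetical layout).
    Returns the occasion means (mean stability) and the correlation across
    teachers for every pair of occasions (covariance stability).
    """
    n_teachers, n_occ = len(ratings), len(ratings[0])
    cols = [[row[t] for row in ratings] for t in range(n_occ)]
    means = [sum(col) / n_teachers for col in cols]
    corrs = {(s, t): pearson(cols[s], cols[t])
             for s in range(n_occ) for t in range(s + 1, n_occ)}
    return means, corrs


# Three hypothetical teachers rated on three occasions.
means, corrs = stability_summary([[3.0, 3.1, 3.0], [4.0, 4.2, 4.1], [2.0, 2.1, 2.3]])
print(means, corrs)
```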

In an early review of research based largely on primary and secondary teaching, Ryans 
(1960) reported an overall negative relation between teaching experience and teaching 
effectiveness. He suggested an initial increase in effectiveness during the first few years, a 
leveling out period, and then a period of gradual decline. In her review of research since the 
early 1960s, Barnes (1985) reached a similar conclusion. At the university level, Feldman (1983) 
reviewed studies relating overall and content-specific dimensions of SETs to teacher age, 
teaching experience, and academic rank. He reported that SETs were only weakly related to these 
three measures of seniority, but that distinct patterns were evident. Overall evaluations tended to 
be negatively correlated with age and, to a lesser extent, years of teaching experience, but 
tended to be positively correlated with academic rank. Thus, younger teachers, teachers with less 
teaching experience, and teachers with higher academic ranks tended to receive somewhat higher 
evaluations. Age and teaching experience showed reasonably similar patterns of correlations with 
overall and content-specific dimensions. Academic rank, however, tended to be positively 
correlated with some characteristics such as subject knowledge, intellectual expansiveness, and 
value of course materials, but negatively correlated with other characteristics such as class 
discussion, respect for students, helpfulness and availability to students. Consistent with the 
reviews by Ryans (1960) and Barnes (1985), Feldman noted that in the few studies that 
specifically examined nonlinear relations, there was some suggestion of an inverted U-shaped 
relation in which ratings improved initially, peaked at some early point, and then declined slowly 
thereafter. 

As I noted earlier, there are important limitations in the use of cross-sectional data for 
evaluating how ratings of the same instructor vary over time. For this reason, I examined 
changes in ratings of a large number of teachers who had been evaluated continuously over a 
13-year period with SEEQ. Using the SEEQ archive I selected all teachers who were evaluated at 
least once during each of 10 different years over the 13-year period the ratings were collected. 
This process identified 195 different teachers who had been evaluated in a total of 6024 different 
courses (an average of 30.9 classes per teacher) from a total of 31 different academic 
departments. A multiple regression approach to ANOVA was used in which linear and nonlinear 
effects of year (1976-1988), course level (2 = graduate, 1 = undergraduate), and their interaction 
were evaluated. Whereas graduate level courses tended to be rated higher than undergraduate 
level courses, the effect of year (the major focus of this analysis) was consistently small. 
First-order correlations and standardized beta weights reflecting changes in ratings over time 
varied from -.067 to +.016 for the 9 SEEQ factors and the two overall rating items. 
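The regression formulation can be sketched as follows. Each class contributes one row containing year, year squared (one possible nonlinear component), course level, and the two year-by-level products. Standardized beta weights are the raw OLS coefficients rescaled by sd(predictor) / sd(rating). This stdlib-only sketch rests on two assumptions: that year is centered at 1982 and that a quadratic term stands in for the nonlinear effects. The original study's exact coding of the nonlinear components is not given here, and all function names are hypothetical.

```python
from statistics import pstdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def year_level_betas(years, levels, y, center=1982):
    """Standardized OLS betas for year, year^2, level and year-by-level terms.

    levels are coded 1 = undergraduate, 2 = graduate, as in the text.
    Centering year at `center` (an assumption) keeps year and year^2 from
    being nearly collinear.
    """
    names = ["year", "year2", "level", "year_x_level", "year2_x_level"]
    preds = []
    for yr, lv in zip(years, levels):
        t = yr - center
        preds.append([t, t * t, lv, t * lv, t * t * lv])
    x = [[1.0] + row for row in preds]  # prepend an intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    b = solve(xtx, xty)
    sd_y = pstdev(y)
    # Standardized beta = raw slope * sd(predictor) / sd(criterion).
    return {name: b[i + 1] * pstdev([row[i] for row in preds]) / sd_y
            for i, name in enumerate(names)}


# Hypothetical class ratings: no year effect, graduate classes rated 0.5 higher.
years = list(range(1976, 1989)) * 2
levels = [1] * 13 + [2] * 13
ratings = [3.0 + 0.5 * lv for lv in levels]
print(year_level_betas(years, levels, ratings))
```

With these artificial data the year terms come out at (numerically) zero and all of the predictable variation falls on level. This is the kind of pattern the text reports for year versus level, though the real betas were small rather than exactly zero.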

The most important influence on SETs is the instructor. To evaluate the 
influence of the instructor, the mean rating of each instructor over all undergraduate classes and 
over all graduate classes was computed. In the main regression models considered, these 
instructor mean ratings were included along with the linear and nonlinear components of the year 
(1976-1988), the course level, and their interactions. Hence, the effects of the individual 
instructor were controlled in evaluating the effects of the other variables. 
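According to the note to Table 3, the instructor means were computed separately for undergraduate and graduate classes. Each class is therefore paired with the mean of its own instructor-by-level cell before that mean enters the regression. A minimal sketch of this step follows. The field names are hypothetical.

```python
from collections import defaultdict


def add_instructor_level_means(classes):
    """Attach to each class its instructor's mean rating at that course level.

    `classes` is a list of dicts with hypothetical keys "instructor", "level"
    (1 = undergraduate, 2 = graduate) and "rating". Each returned dict gains an
    "instr_mean" predictor computed within its (instructor, level) cell, so that
    instructor differences can be controlled when the other effects are tested.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c in classes:
        key = (c["instructor"], c["level"])
        sums[key] += c["rating"]
        counts[key] += 1
    return [{**c, "instr_mean": sums[(c["instructor"], c["level"])]
             / counts[(c["instructor"], c["level"])]}
            for c in classes]


rows = add_instructor_level_means([
    {"instructor": "A", "level": 1, "rating": 4.0},
    {"instructor": "A", "level": 1, "rating": 3.0},
    {"instructor": "A", "level": 2, "rating": 5.0},
    {"instructor": "B", "level": 1, "rating": 2.0},
])
print([r["instr_mean"] for r in rows])  # prints [3.5, 3.5, 5.0, 2.0]
```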

Insert Table 3 About Here 

The individual instructor accounted for most of the variance in each of the different 
SEEQ scores (Table 3). Because the instructor effect reflects ratings of the same teacher over 
time, this demonstrates the covariance stability (test-retest correlations) of the SETs. There were 
almost no systematic changes in the mean rating across all teachers over time, thus supporting 
the mean stability of the ratings. Year accounted for no more than 1/4 of 1% of the variance in any of the 
evaluation scores and, despite the large N and powerful design, reached statistical 
significance for only 2 of 11 scores. Supplemental analyses suggested that the standards that students 
used apparently did not change over this period. The nonlinear effects suggested by cross-sectional 
studies were not observed for either the total sample or subsamples of teachers with 
little, intermediate, or substantial amounts of teaching experience at the start of the 13-year 
longitudinal study. These results are important because this is apparently the only study to 
examine the mean stability of faculty ratings using a longitudinal design with a large and diverse 
group of teachers over such a long period of time. The results also demonstrate the potential 
dangers in trying to infer changes over time from cross-sectional data instead of true 
longitudinal data. 

MODELS OF COVARIANCE STABILITY 



The purpose of analyses described in this section is to evaluate models of covariance 
stability and change. Data are SEEQ responses from a cohort of 157 teachers who had been 
evaluated in at least one course during each of at least 7 years during an 8-year period. Separate 
analyses were conducted on the 7 SEEQ scales measured by 4 items (the other 2 scales are 
measured by 2 or 3 items). 

Insert Figure 2 

The simplex model is apparently the most widely used approach for analyzing covariance 
stability in longitudinal data. The critical feature of the simplex model is that correlations 
between measures collected at adjacent occasions are highest and that sizes of correlations 
decrease steadily as a function of the number of occasions separating two measures. A viable 
alternative is a simple "one factor" model in which ratings of the same SEEQ scale at different 
points in time reflect a stable "true" score and an error component. The simplex model implies 
systematic gradual changes over time, such that the longer the intervening period of time, the 
lower the test-retest correlations, whereas the one-factor model does not necessarily imply any 
systematic change. Inspection of correlations between ratings on different occasions, however, 
suggests that the simplex model may be wrong. For example, ratings of Learning/Value in 
1980 and 1981 correlated .58, whereas ratings in 1980 and 1987 correlated .56. 
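The two models are easiest to contrast through the correlations each one implies as a function of the lag between occasions. Under a stationary quasi-simplex (a simplex with measurement error), correlations shrink geometrically as the lag grows. Under a one-factor model they stay at the product of the two loadings whatever the lag. The second pattern is closer to the Learning/Value correlations of .58 at a one-year lag and .56 at a seven-year lag. The parameter values below are hypothetical. Stationarity (equal reliabilities and stabilities across occasions) is a simplifying assumption and may not match the models actually fitted.

```python
def quasi_simplex_corr(reliability, stability, lag):
    """Observed correlation implied by a stationary quasi-simplex model.

    True scores follow T_t = stability * T_(t-1) + disturbance, so true-score
    correlations decay as stability ** lag; measurement error multiplies every
    observed correlation by the (here constant) reliability.
    """
    return reliability * stability ** lag


def one_factor_corr(loading_s, loading_t):
    """Observed correlation implied by one stable factor: the product of loadings."""
    return loading_s * loading_t


# Hypothetical parameters chosen only to contrast the two shapes.
for lag in (1, 3, 7):
    print(lag, round(quasi_simplex_corr(0.60, 0.95, lag), 3),
          round(one_factor_corr(0.76, 0.76), 3))
```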

Simplex and one-factor models were both evaluated with single-indicator models (the 
mean of the 4 items used to infer each scale) and multiple-indicator models (the four items). In 
both cases, the one-factor models fit the data better than the simplex model. The results challenge 
a potential over-reliance on the simplex model. The results also suggest that the assumption of a 
systematic, gradual change in SETs of the same teacher over time is apparently false. 




Thus far, I have considered only the generalizability of individual SEEQ scales. In earlier work 
(Marsh, 1987) I noted the need to examine profiles of SEEQ scores as well as the individual scales that 
make up the profile. More specifically, I suggested that each instructor has a distinguishable 
profile of SEEQ scales (e.g., high on organization and low on enthusiasm) that generalizes over 
different course offerings and is distinct from the profiles of other instructors. Because 
apparently no other research has evaluated SET profiles in this manner, I conducted a 
profile analysis of ratings selected from the longitudinal SEEQ archive. In this study I considered 
3079 sets of class-average responses for 123 instructors (an average of 25 classes per instructor) 
who had been evaluated regularly over a 13-year period for both graduate and undergraduate 
courses. Because there were so many sets of ratings for each instructor, it was possible to 
determine a characteristic profile of SEEQ scores for each instructor. In profile analyses, it is 
important to distinguish between the level of scores (whether an instructor consistently receives 
high or low ratings) and the shape of the profile (e.g., relatively higher on organization and 
relatively lower on enthusiasm). 
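For illustration, one common way to separate the two aspects (assumed here, not necessarily the exact procedure of the study) is to treat the instructor's mean across the SEEQ scales as the level and the deviations from that mean as the shape. Two instructors with the same shape but different levels then have parallel profiles. Here is a sketch using hypothetical z-scores:

```python
def level_and_shape(profile):
    """Split an instructor's profile of SEEQ scale scores into level and shape.

    level: mean across scales (is this instructor rated high or low overall?).
    shape: each scale's deviation from that mean (which scales are relatively
    higher or lower for this instructor?). Shape deviations sum to zero.
    """
    level = sum(profile.values()) / len(profile)
    shape = {scale: score - level for scale, score in profile.items()}
    return level, shape


# Hypothetical z-scores for one instructor on three of the nine scales.
level, shape = level_and_shape({"Enthusiasm": 0.2, "Organization": 0.8, "Breadth": 0.5})
print(round(level, 2), {k: round(v, 2) for k, v in shape.items()})
```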

The profiles of four teachers (Figure 3) illustrate the "level" and "shape" comparisons that 
are the focus of the profile analysis. Each profile is the average rating across all sets of ratings of 
the same teacher collected during the 13-year period. Instructors 1 and 2 have generally higher 
ratings than instructors 3 and 4, demonstrating the effect of level. The effect of shape can be 
seen by comparing the Enthusiasm and Organization scores for the different teachers. Instructors 
1 and 3 have consistently higher ratings for Organization than for Enthusiasm, whereas instructors 2 
and 4 have consistently higher ratings for Enthusiasm than for Organization. 

Insert Figure 3 

In the ANOVA model used to evaluate profiles, I considered the main effects of 
instructors, course level, instructor by course-level interaction, and the extent to which these 
varied depending on the particular SEEQ component. These main effects reflect the effects 
averaged across the 9 SEEQ scales. 
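The note to Table 4 explains how the profile analysis recasts each class's 9 SEEQ scores: a total score for the between-classes effects, plus 8 difference scores between adjacent scales (the "repeated" transformation in SPSS) for the within-profile effects such as Scales x Instructor. The sketch below shows that transformation for a single class. The function name and the scale ordering are illustrative. As the same note points out, the choice of ordering does not affect the results.

```python
SEEQ_SCALES = ["Learning", "Enthusiasm", "Organization", "Group", "Individual",
               "Breadth", "Exams", "Assignments", "Workload"]


def repeated_transform(scores):
    """Total score plus 8 adjacent differences for one class's 9 SEEQ scores.

    `scores` lists the 9 scale scores in the fixed order of SEEQ_SCALES.
    The total carries the between-classes (level) information; the adjacent
    differences carry the within-profile (shape) information.
    """
    if len(scores) != len(SEEQ_SCALES):
        raise ValueError("expected one score per SEEQ scale")
    total = sum(scores)
    diffs = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return total, diffs


# Hypothetical z-scores for one class.
print(repeated_transform([0.4, 0.9, 0.1, -0.2, 0.3, 0.0, 0.2, 0.5, -0.6]))
```

Adding a constant to all 9 scores changes only the total and leaves the 8 differences untouched. This is why level effects and shape effects can be tested separately.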

The very large effect of the instructor is an important finding. The eta squared value of 
.371 is equivalent to an average correlation of .61 between ratings of the same instructor 
(averaged across all SEEQ factors) on different course offerings. This is consistent with my 
earlier (Marsh, 1981) finding that the average correlation was .71 across two offerings of the 
same course and .52 across ratings of different courses. 
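The effect sizes in Table 4 can be recovered from its sums of squares and degrees of freedom. Eta squared is SS_effect / SS_total. For omega squared, the sketch assumes the standard fixed-effects formula (SS_effect - df_effect x MS_error) / (SS_total + MS_error). That assumption is supported by the fact that the formula reproduces the tabled values. The square root of the instructor's eta squared gives the .61 quoted above.

```python
from math import sqrt


def eta_squared(ss_effect, ss_total):
    """Proportion of total variation attributable to the effect."""
    return ss_effect / ss_total


def omega_squared(ss_effect, df_effect, ss_total, ss_error, df_error):
    """Fixed-effects omega squared: (SS_eff - df_eff * MS_err) / (SS_tot + MS_err)."""
    ms_error = ss_error / df_error
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Between-classes (total score) values taken from Table 4.
SS_TOTAL, SS_ERROR, DF_ERROR = 10488, 5493, 2833
eta = eta_squared(3894, SS_TOTAL)  # Instructor effect
omega = omega_squared(3894, 122, SS_TOTAL, SS_ERROR, DF_ERROR)
print(round(eta, 3), round(omega, 3), round(sqrt(eta), 2))  # prints 0.371 0.349 0.61
```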

Although not a main focus of the present investigation, the results again show that graduate 
courses are evaluated somewhat more highly than undergraduate courses. The statistically 
significant, although modest, instructor by course-level interaction suggests that some teachers 
get systematically higher ratings in graduate level courses whereas other teachers get higher 
ratings in undergraduate courses. This interaction effect, however, is much smaller than the main 
effect of instructor. 

The most important effect in these analyses is the extent to which the effect of instructor 
varies as a function of the particular SEEQ factor. If this interaction is small, it suggests that 
highly rated teachers receive consistently high ratings across all SEEQ factors and that poorly 
rated teachers receive consistently poor ratings across all SEEQ factors. In fact, however, the 
effect of the instructor varies systematically with the SEEQ factor. Of the total variation in 
profiles across more than 3,000 classes, nearly 50% is due to the specific instructor. This substantial effect 
clearly demonstrates that the profiles associated with each instructor are consistent across 
different course offerings by that instructor, and distinct from the profiles of other instructors. 
This finding is the most important result of the Marsh and Bailey (1991) study. 

The profile of 9 SEEQ scales (e.g., Enthusiasm, Organization, Group Interaction) for each 
instructor was shown to be distinct from the profiles of other instructors, generalized across 
course offerings over the 13-year period, and generalized across undergraduate and graduate 
level courses. This support for the existence of a distinct profile that is specific to each instructor 
has important implications for the use of SETs and opens up new, largely unexplored areas for 
further research. For example, validity research typically focuses on the relations between 
external validity criteria and either overall SETs or, sometimes, specific SET scales. Because there are 
reliable individual differences in SET profiles, these results imply that researchers should also 
consider how different profiles are related to external criteria of effective teaching. For example, 
it may be that student learning is maximized when both Enthusiasm and Organization are high, 
whereas being high on just one or the other is insufficient. Similarly, the demonstration of 
systematic, reliable individual differences in SET profiles supports the use of profiles in 
formative feedback and summative evaluations, and has important implications for the study of 
teaching and teaching styles. The results also provide further support for the multidimensionality 
of SETs. 




This set of studies has important theoretical and practical implications for the use of 
students' evaluations. The unique strength of this set of studies is the large number of classes 
evaluated with the same SEEQ instrument over such a long period of time. The series of factor 
analyses provided clear support for the generalizability of SEEQ factor structure over time, over 
courses in different disciplines, and over teaching at different levels. Higher-order factor 
analyses suggested that whereas there may be higher-order factors underlying the 9 SEEQ 
factors, these factors were apparently not particularly useful in providing a smaller number of 
scores with which to summarize SEEQ responses. Consistent with these findings, the profile 
analysis indicated that each instructor has a reasonably distinct profile of SEEQ scores that 
generalizes over time and across courses taught at both graduate and undergraduate levels. 

In two studies I looked at the stability of mean differences and the covariance stability of 
individual differences. In contrast to suggestions based on reviews of cross-sectional studies, the 
mean ratings of the same cohort of teachers who were evaluated consistently over a 13-year 
period showed no systematic increases or decreases. This study also indicated that a substantial 
portion of the variance in the ratings could be explained in terms of the teacher who taught the 
course. In the study of covariance stability, simplex models that posit systematic changes in the 
ratings over time were not able to fit the data very well, whereas a good fit was found with a 
simple one-factor model positing that ratings for each year reflect a single stable factor that 
generalizes over time. 
Appendix 

The Students' Evaluations of Educational Quality (SEEQ) Instrument. Interested individuals are 
invited to use the SEEQ freely for their own personal use. Parties interested in using SEEQ on a 
wider basis are requested to obtain permission from Herbert W. Marsh, which will be freely 
given. This will allow the application and performance of SEEQ to be monitored. Some 
"Student and Course Characteristics," labelling, and, perhaps, aspects of the instructions may be 
changed as necessary, but in other respects the SEEQ instrument should be left intact. 
Put Instructor's Name, Department Name and Course Number at top 
Table 1 (Reproduced From Marsh & Hocevar, 1991a) 

Summary of the 21 Subsamples of Courses 

     No. of 
    classes   Academic unit 

Undergraduate courses taught by teaching assistants 
  1.    431   General 
  2.    610   Business 
  3.    565   Humanities 
  4.   1606   Social Sciences 
  5.    683   Spanish and Portuguese 
  6.   1368   Economics 
  7.    902   Communication 

Undergraduate courses taught by regular faculty 
  1.   1421   General 
  2.   2326   Business 
  3.    956   Humanities 
  4.   2320   Social Sciences 
  5.   1693   Engineering 
  6.    590   History 
  7.    538   Psychology 

Graduate courses taught by regular faculty 
  1.    757   General 
  2.   2049   Business 
  3.   1157   Social Sciences 
  4.    957   Engineering 
  5.   1213   Education 
  6.    457   Systems Engineering 
  7.   1559   Safety and Systems Management 

Total 24,158 

Note. For present purposes all classes with six or more sets [...] 
subsample had at least 400 classes. All classes were first 
classified into one of three general groups consisting of undergraduate classes 
taught by teaching assistants, undergraduate classes taught by 
regular faculty, and graduate courses taught by regular faculty. 
[...] were classified into divisions or schools 
(e.g., Social Sciences or Engineering) and then into specific 
departments (e.g., Psychology or Systems Engineering). [...] 
classified into one and only one subsample. 
Table 2 (Reproduced From Marsh & Hocevar, 1991a) 

Factor Analysis Results for the Total Sample of 24,158 Sets of Class-average Responses: Factor Loadings and Factor Correlations 

                                            SEEQ factors
SEEQ scales and items (paraphrased)         Lrn   Enth   Orgn    Grp    Ind    Brd    Exm   Asgn   Work

Learning/Value
  Course challenging & stimulating         .434   .168   .103   .015   .014   .159   .099   .155   .291
  Learned something valuable               .607   .083   .100   .026   .050   .103   .085   .147   .113
  Increase subject interest                .646   .078   .034   .039   .058   .169   .074   .131   .020
  Learned & understood subject matter      .487   .043   .176   .152   .045   .047   .112   .149  -.217
  Overall course rating                    .410   .211   .173   .041   .042   .085   .166   .175   .069
Instructor Enthusiasm
  Enthusiastic about teaching              .095   .544   .129   .072   .195   .115   .052   .069   .025
  Dynamic and energetic                    .064   .714   .094   .059   .085   .083   .069   .071   .042
  Enhanced presentation with humor            ?   .650  -.023   .103   .078   .129   .090   .054  -.045
  Teaching style held your interest        .137   .581   .187   .131   .026   .050   .110   .073   .017
  Overall instructor rating                .172   .392   .245   .083   .141   .096   .140   .075   .039
Organization/Clarity
  Lecturer explanations clear              .146   .165   .510   .176   .060   .075   .079   .104  -.072
  Materials well explained & prepared      .069   .087   .677   .060   .075   .073   .094   .118   .005
  Course objectives stated & pursued       .128   .026   .529   .055   .070   .065   .175   .184   .024
  Lectures facilitated taking notes        .031   .040   .589  -.093   .049   .175   .146   .044   .020
Group Interaction
  Encouraged class discussion              .058   .103   .011   .769   .070   .033   .067   .080   .002
  Students shared knowledge/ideas          .066   .049  -.015   .797   .095   .093   .048   .073  -.029
  Encouraged questions & gave answers      .059   .105   .167   .583   .151   .094   .100   .080   .001
  Encouraged expression of ideas           .045   .069   .035   .674   .182   .110   .094   .070  -.013
Individual Rapport
  Friendly towards individual students     .051   .163  -.001   .176   .612   .063   .112   .057  -.038
  Welcomed students seeking help/advice    .042   .059   .061   .078   .786   .036   .093   .059  -.007
  Interested in individual students        .086   .140   .001   .137   .647   .057   .138   .059   .004
  Accessible to individual students       -.014  -.028   .139   .037   .636   .099   .136   .104   .010
Breadth of Coverage
  Contrasted various implications          .043   .037   .118   .059   .068   .676   .077   .109   .065
  Gave background of ideas/concepts        .087   .085   .134   .020   .044   .662   .056   .122   .004
  Gave different points of view            .035   .066   .086   .123   .101   .636   .097   .113  -.004
  Discussed current developments           .101   .113   .018   .086   .039   .562   .084   .040   .000
Examinations/Grading
  Examination feedback valuable            .034   .039   .111   .047   .101   .028   .670   .088   .044
  Evaluation methods fair/appropriate      .047   .044   .011   .043   .107   .078   .749   .099  -.033
  Tested course content as emphasized      .063   .036      ?      ?   .064   .047   .643   .146  -.026
Assignments/Readings
  Readings/texts were valuable            -.008  -.004   .019   .022   .018   .053   .025   .555  -.003
  They contributed to understanding        .127   .021   .036   .027   .039   .012   .140   .716   .072
Workload/Difficulty
  Course difficulty (easy-hard)           -.028   .030   .051  -.059  -.017   .096   .015   .018   .861
  Course workload (light-heavy)            .100      ?   .004   .085  -.001   .002  -.035   .038   .907
  Course pace (slow-fast)                 -.098   .101   .055  -.099   .005  -.001   .035   .040   .689
  Hours per week outside of class          .148  -.044  -.085   .034  -.001  -.006  -.006   .042   .798

Factor pattern correlations
  Learning/Value                          1.000
  Instructor Enthusiasm                    .434  1.000
  Organization/Clarity                     .407   .427  1.000
  Group Interaction                        .350   .364      ?  1.000
  Individual Rapport                       .263   .400   .331   .455  1.000
  Breadth of Coverage                      .449   .419   .454   .327   .352  1.000
  Examinations/Grading                     .401   .392   .511   .315   .493   .403  1.000
  Assignments/Readings                     .488   .319   .431   .312   .338   .418   .510  1.000
  Workload/Difficulty                      .128   .076   .044  -.072  -.009   .106   .033   .154  1.000

Note. Lrn = Learning/Value; Enth = Instructor Enthusiasm; Orgn = Organization/Clarity; Grp = Group Interaction; Ind = Individual Rapport; Brd = Breadth of Coverage; Exm = Examinations/Grading; Asgn = Assignments/Readings; Work = Workload/Difficulty. Target loadings are the loadings of the items designed to define each SEEQ factor (e.g., the Learning/Value items on Lrn). A question mark denotes an entry that is illegible in the source copy. 
Table 3 (Reproduced From Marsh & Hocevar, 1991b) 

Changes in Multiple Dimensions of Students' Evaluations Over Time for Ratings 
of the Same Instructor: The Effects of Instructor, Year (1976-1988), Level 
(Undergraduate and Graduate), and Their Interaction (N = 3135) 

                     r for   Standardized Beta Weights For:
Dimension            Instr    Instr     Year    Year2    Level   YrxLev   Yr2xLev  Mult R

Factor Scores
  Learning/Value      .701**   .703**   .001    -.045**  -.023     .018     .025     .703**
  Enthusiasm          .822**   .622**  -.016    -.019    -.003     .010     .006     .822**
  Organization        .770**   .770**  -.048**  -.025     .000     .017    -.004     .772**
  Group Interaction   .814**   .815**  -.012    -.020    -.009    -.013     .010     .815**
  Individual Rapport  .747**   .746**  -.026     .016     .006     .006    -.009     .748**
  Breadth             .735**   .735**   .005    -.011     .000     .009    -.007     .736**
  Exams               .678**   .678**  -.028    -.017     .006    -.008    -.014     .678**
  Assignments         .704**   .704**  -.004    -.024    -.008     .012     .006     .704**
  Workload            .797**   .797**  -.020    -.009     .010    -.009     .007     .797**

Overall Ratings
  Course              .725**   .725**  -.031    -.028    -.013    -.031     .019     .726**
  Instructor          .756**   .755**  -.048**  -.020    -.010     .009     .015     .758**

Note. The Instructor (Instr) component was obtained by taking the mean of the 
instructor's ratings for undergraduate classes and for graduate classes, and then 
including these means in the prediction of ratings. Because these means were 
computed separately for graduate and undergraduate level courses, this has the 
effect of eliminating variance due to course level. 

* p < .05. ** p < .01. 
Table 4 (Reproduced From Marsh & Bailey, 1991) 

Univariate Repeated Measures and MANOVA Analyses of SEEQ Profiles 

                        Repeated Measures Analysis                    MANOVA
                                        Effect Sizes
Source                      SS      df   eta^2 omega^2    Wilks'  Hypoth    Error  Multivariate
                                                          Lambda      df       df             F

Between (Total Scores)
  Instructor (I)          3894     122    .371    .349
  Course Level (L)         166       1    .016    .016
  I x L                    934     122    .089    .067
  Error Between           5493    2833
  Total Between          10488    3078

Within (Profiles) (a)
  Scales x I              8099     976    .470    .451     .0050     976    22594         23.60
  Scales x L               108       8    .006    .006     .8923       8     2826         42.65
  Scales x I x L          1195     976    .069    .050     .3403     976    22594          3.31
  Error Within            7812   22664
  Total Within           17214   24632

Note. The repeated measures and MANOVA approaches for the Total Scores (i.e., 
the Between Groups portion) are equivalent since there is only one dependent 
measure. For both approaches the 9 SEEQ scores were transformed into 8 
difference scores between adjacent SEEQ factors, the standard "repeated" 
transformation in SPSS (1988). Although the ordering of the SEEQ scores is 
arbitrary, the results in no way depend on the particular ordering used (see 
Tabachnick & Fidell, 1989, for further discussion). 

(a) The Greenhouse-Geisser, Huynh-Feldt, and Lower-bound Epsilons were 0.872, 
0.950, and 0.125 respectively. Even when the Lower-bound Epsilon, which is known 
to be maximally conservative, was used, all tests of statistical significance 
were significant at p < .001. 
Figure 1 (Reproduced From Marsh, 1991b) 

[Figure: four panels, labeled Model H1, Model H2, Model H3, and Model H4] 

Figure 1. Four a priori higher-order models of relations among first-order Students' 
Evaluations of Educational Quality (SEEQ) factors. (Each first-order factor is inferred from multiple 
indicators based on the design of the SEEQ. To avoid clutter, the multiple indicators of each first-order 
factor and correlations among higher-order factors are not presented.) 



Figure 3 (Reproduced From Marsh & Bailey, 1991) 

[Figure: four panels, one per instructor, including Instructor 1 (n = 24 classes) and Instructor 2 (n = 35 classes); each panel plots the nine SEEQ scales: Learning, Enthusiasm, Organization, Group Interaction, Individual Rapport, Breadth, Exams, Assignments, Workload] 

Figure 3. Profiles of nine SEEQ scales for 4 instructors (see Appendix for more detail on the SEEQ scales). All scores 
were standardized (mean = 0, SD = 1) across all sets of ratings used in the study. Each profile represents the mean 
score for each SEEQ scale (the boxes), averaged across all the classes for that instructor. Thus, for example, all scores 
above the line representing a z-score of zero reflect ratings that are above average. Also presented for each scale is the 
range of scores corresponding to the mean plus and minus one standard deviation (based on the set of ratings for the 
particular instructor for that particular scale). 



